A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.
A novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks.
The SCARE 2025 guideline provides an up-to-date framework for surgical case reports in the era of AI and adds specific reporting criteria for AI to ensure that any use of artificial intelligence in a case report is clearly documented, explained and discussed including with respect to bias and ethics.
A guideline for transparently reporting the use of AI in any manuscript is presented; it will evolve over time as technology, systems and behaviour evolve.
The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein–ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein–nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody–antigen prediction accuracy.
This work improves existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales and presents a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens.
This System Card provides a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and describes the measures the authors have implemented to ensure the model is safe and aligned.
OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations, is introduced, and it is shown that OpenVLA can be effectively fine-tuned for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language-grounding abilities.
Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters, delivers the best performance for its size and even offers competitive alternatives to models that are 2-3 times bigger.
This work introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks, and provides a model fine-tuned to follow instructions, Mixtral 8x7B-Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B-chat model on human benchmarks.
Recent improvements to Job Dispatcher are reviewed, including its brand-new website and documentation, enhanced visualisations, improved job management, and a rising trend of user reliance on the service from low- and middle-income regions.
The introduction is organized in a unique didactic manner developed by the authors, starting from simpler concepts such as linear programming and single-point methods, and advancing from these to more difficult concepts such as optimality conditions for nonlinear optimization and set-oriented solution algorithms.
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models, and presents comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development.
OLMo is built, a competitive, truly Open Language Model, to enable the scientific study of language models and it is hoped this release will empower the open research community and inspire a new wave of innovation.
Two below-threshold surface code memories on Willow, a distance-7 code and a distance-5 code integrated with a real-time decoder, indicate device performance that, if scaled, could realize the operational requirements of large-scale fault-tolerant quantum algorithms.
With an improved framework for model development and evaluation, a large language model is shown to provide answers to medical questions that are comparable to, or preferred over, those provided by human physicians.
Whereas current editing models exhibit degradation in character consistency and stability across multiple turns, FLUX.1 Kontext is observed to better preserve objects and characters, leading to greater robustness in iterative workflows.
The development of TRIPOD+AI is described, and the expanded 27-item checklist, with a more detailed explanation of each reporting recommendation, is presented alongside the TRIPOD+AI for Abstracts checklist.
The Cosmos World Foundation Model Platform is presented to help developers build customized world models for their Physical AI setups; it positions a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
SigLIP 2 is introduced, a family of new multilingual vision-language encoders that builds on the success of the original SigLIP and extends the original image-text training objective with several prior, independently developed techniques into a unified recipe.
The results show the significant potential of AI in personalizing learning, automating routine tasks, and providing access to knowledge, but also reveal serious risks of exacerbating social inequality and ethical dilemmas.
This paper presents a fundamental algorithm for parsing natural language sentences into dependency trees that operates one word at a time, attaching each word as soon as it can be attached, corresponding to properties claimed for the parser in the human brain.
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2, a large model that significantly outperforms other models of comparable size and makes the model weights available under an OpenRAIL license.
The model, called CUT3R (Continuous Updating Transformer for 3D Reconstruction), captures rich priors of real-world scenes: not only can it predict accurate pointmaps from image observations, but it can also infer unseen regions of the scene by probing at virtual, unobserved views.
The development and implementation of DNA barcoding for the Darwin Tree of Life Project (DToL), which aims to sequence and assemble high quality reference genomes for all eukaryotic species in Britain and Ireland, is described.
This work introduces MAST, the first Multi-Agent System Failure Taxonomy, built from a comprehensive dataset of 1600+ annotated traces collected across 7 popular MAS frameworks, and develops an LLM-as-a-Judge pipeline with high agreement with human annotations to enable scalable annotation.
A novel language model architecture is studied that scales test-time computation by reasoning implicitly in latent space: iterating a recurrent block unrolls the model to arbitrary depth at test time.
Extensive evaluation shows that Kimi-Audio achieves state-of-the-art performance on a range of audio benchmarks including speech recognition, audio understanding, audio question answering, and speech conversation.
Aurora, a large-scale foundation model trained on more than one million hours of diverse geophysical data, outperforms operational forecasts in predicting air quality, ocean waves, tropical cyclone tracks and high-resolution weather, all at orders of magnitude lower computational cost.
It is shown that language models trained at scale on evolutionary data can generate functional proteins that are far away from known proteins, and ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins is presented.
PaliGemma is an open Vision-Language Model that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model that achieves strong performance on a wide variety of open-world tasks.
This work introduces MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B that demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models.
The conversational diagnostic artificial intelligence system AMIE (Articulate Medical Intelligence Explorer) has potential as a real-world tool for clinical history-taking and diagnostic dialogue, based on its performance in simulated consultations.
TRIPOD-LLM (Transparent Reporting of a multivariable model for Individual Prognosis Or Diagnosis, adapted for large language models) is a checklist of items considered essential for good reporting of studies that develop or evaluate an LLM for use in healthcare settings; it is a 'living guideline' that emphasizes transparency, human oversight and task-specific performance reporting.
Approaches for the development of future at-scale neuromorphic systems based on principles of biointelligence are described, along with potential applications of scalable neuromorphic architectures and the challenges that need to be overcome.
The STROCSS 2025 guideline provides an up-to-date framework for surgical observational studies in the era of AI and adds specific reporting criteria for AI to ensure that any use of artificial intelligence in a surgical observational study is clearly documented, explained and discussed including with respect to bias and ethics.
A proof-of-principle study reports a complete photonic quantum computer architecture that can, once appropriate component performance is achieved, deliver a universal and fault-tolerant quantum computer.
This study reviews the techniques and tools used for automatic disease identification, state-of-the-art DL models, and recent trends in DL-based image analysis, and evaluates various DL architectures, providing guidance on the suitability of these models for production environments.
This work introduces Tulu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques.
Molmo is presented, a new family of VLMs that are state-of-the-art in their class of openness, with a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions.